Analysis of microbial composition and sharing in low-biomass human milk samples: a comparison of DNA isolation and sequencing techniques

Human milk microbiome studies are currently hindered by low milk bacterial/human cell ratios and often rely on 16S rRNA gene sequencing, which limits downstream analyses. Here, we aimed to find a method to study milk bacteria and assess bacterial sharing between maternal and infant microbiota. We tested four DNA isolation methods, two bacterial enrichment methods and three sequencing methods on mock communities, milk samples and negative controls. Of the four DNA isolation kits, the DNeasy PowerSoil Pro (PS) and MagMAX Total Nucleic Acid Isolation (MX) kits provided consistent 16S rRNA gene sequencing results with low contamination. Neither enrichment method substantially decreased the human metagenomic sequencing read-depth. Long-read 16S-ITS-23S rRNA gene sequencing biased the mock community composition but provided consistent results for milk samples, with little contamination. In contrast to 16S rRNA gene sequencing, 16S-ITS-23S rRNA gene sequencing of milk, infant oral, infant faecal and maternal faecal DNA from 14 mother-infant pairs provided sufficient resolution to detect significantly more frequent sharing of bacteria between related pairs compared to unrelated pairs. In conclusion, PS or MX kit-DNA isolation followed by 16S rRNA gene sequencing reliably characterises human milk microbiota, and 16S-ITS-23S rRNA gene sequencing enables studies of bacterial transmission in low-biomass samples.


Quantitative PCR
BaseClear performed a quantitative PCR (qPCR) assay targeting the 16S rRNA gene to estimate the total bacterial DNA content of samples 1 .Each 10 µl reaction consisted of 5 µl ABsolute TM Blue qPCR Mix (low ROX, AB4318B, Thermo Fisher), 0. Netherlands).The slope of the standard curve was -3.52, the y-intercept was 38.8, the efficiency was 92.3% and R 2 was 0.998.All samples, standards and no template controls were processed in triplicates in a single qPCR assay.The triplicates for the no template control had Cq values of 36.0, 36.9 and >40, while PS kit-DNA isolation negative controls had Cq values ≥36.4.The coefficient of variation for sample replicates was ≤1.2%.Results were normalised per 1 ml human milk sample input for DNA isolation or per 1 ng DNA used as input for the qPCR assay.

Bacterial enrichment
We tested two bacterial enrichment methods on the method test samples.Both hypotonically lysed human cells and degraded freed DNA prior to DNA isolation.Milk and negative control samples were prepared in triplicate with each enrichment method, while the bacterial mock community was prepared once with each method.

Hypotonic lysis and DNA degradation with benzonase endonuclease
The hypotonic lysis and benzonase treatment (HL-Benz) method was adapted from Nelson et al. 2019 2 , who successfully used it for reducing human host contamination in complex respiratory samples prior to DNA isolation and metagenomic sequencing.
For hypotonic lysis, 5.5 ml filtered DEPC-treated water was added to 3.5 ml milk and negative controls, respectively.For the positive control, 8.9 ml filtered DEPC-treated water were added to 75 µl bacterial mock community sample.Samples were incubated with gentle agitation at room temperature (20 to 25°C) for 1 h.
Samples were centrifuged at 8 000 x g for 10 min to pellet bacterial cells, and fat and supernatants were removed.Bacterial pellets were resuspended in 1 ml phosphate-buffered saline (PBS, P5119, Sigma-Aldrich), moved to 2-ml tubes, centrifuged at 8 000 x g for 10 min, and supernatants were removed.Pellets were then resuspended in 1 ml Tris-EDTA, pH 8.0, (TE, 93283, Sigma-Aldrich) and 10 µl of 0.5 M EDTA, pH 8.0, (15575-038, Invitrogen) were added to inactivate the benzonase.Afterwards, samples were again centrifuged at 8 000 x g for 10 min and bacterial pellets were washed by addition of 1 ml PBS, centrifugation at 8 000 x g for 10 min, and removal of supernatants.Finally, bacterial pellets were resuspended in 175 µl PBS and stored at -20°C until DNA extraction.

Hypotonic lysis and DNA degradation with propidium monoazide and light
The hypotonic lysis and propidium monoazide treatment (HL-PMA) method was adapted from Marotz et al. 3 , who successfully used it to remove human host cell DNA from saliva samples prior to metagenomic sequencing.
For hypotonic lysis, 10.5 ml filtered DEPC-treated water was added to 3.5 ml milk and negative controls, respectively.For the positive control, 13.9 ml filtered DEPC-treated water were added to 75 µl bacterial mock community sample.Samples were incubated with gentle agitation at room temperature for 1h.
Samples were centrifuged at 9 400 x g for 10 min, and fat and supernatants were removed.
Pellets were resuspended in 175 µl filtered DEPC-treated water and moved to 2-ml tubes.To inactivate freed DNA, 1.75 µl of 1 mM propidium monoazide (PMA, 40013, Biotium, Fremont, California, USA) were added, samples were briefly vortexed, and incubated in the dark at room temperature for 5 min.Then, the sample tubes were placed horizontally on ice 15 cm below a blue lamp (Paulmann 128970 PMQ 11/18 230V BLUE 1W, Paulmann, Springe-Völksen, Germany) and incubated for 50 min with gentle agitation.Samples were stored at -20°C until DNA extraction.

DNA isolation and quantification
After bacterial enrichment, samples were used for DNA isolation with the PS kit and Qubitbased DNA quantification as described in the Methods section.DNA eluates were stored at -20ºC until further processing.

Shotgun metagenomic sequencing and data processing
DNA was sent to the Institute of Clinical Molecular Biology at the Christian-Albrecht University of Kiel in Germany for shotgun metagenomic sequencing.Sequencing libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (E7645L, NEB, Ipswich, Massachusetts, USA) and sequenced on a NovaSeq 6000 (2x 150 bp, Illumina).
Shotgun metagenomic sequencing data was processed using an approach previously described for the Lifelines Dutch Microbiome Project  DNA was isolated from bacterial mock communities using four DNA isolation kits, and samples were sequenced using 16S rRNA gene sequencing.Bacterial taxonomies were assigned using the SILVA database.The plot shows the relative abundances of bacterial species in mock communities, including bacterial mock communities isolated with the four DNA isolation kits, a DNA mock community (DNA-Mock, added at the start of library preparation) and a theoretical mock community (Theor-Mock, see Table S1).DNA was isolated from bacterial mock communities using the DNeasy PowerSoil Pro kit, and samples were sequenced using 16S and 16S-ITS-23S rRNA gene sequencing.Bacterial taxonomies were assigned using the FANGORN database.The plot shows the relative abundances of bacterial species in the sequenced mock communities and a theoretical mock community (Theor-Mock, see Table S1).Mock communities are clustered based on Aitchison distances.DNA was isolated from human milk, infant oral swabs, infant faeces, and maternal faeces from 14 Lifelines NEXT mother-infant pairs using the DNeasy PowerSoil Pro kit.Samples were sequenced using 16S and 16S-ITS-23S rRNA gene sequencing and bacterial taxonomies were assigned using the FANGORN database.The plot shows the relative abundances of classified bacterial genera present in at least 14 samples with at least 2% relative abundance in any

Figure S1 :
Figure S1: Mock community composition by DNA isolation method.

Figure S2 :Figure S3 :
Figure S2: Bacterial genera in negative controls by DNA isolation method.DNA was isolated from filtered DEPC-treated water using four DNA isolation kits and samples were sequenced using 16S rRNA gene sequencing.Bacterial taxonomies were assigned using the SILVA database.The plot shows the relative abundances of bacterial genera.Only classified bacterial genera present in at least two samples with at least 2% relative abundance in any sample are shown.Genera present at lower prevalence and abundance or without genuslevel classification are summarised in 'Other'.Numbers above the bars indicate the number of sequencing reads of the respective sample.NC = Negative control, FS = QIAamp Fast DNA Stool Mini kit, MD = Milk Bacterial DNA Isolation kit, PS = DNeasy PowerSoil Pro kit, MX = MagMAX Total Nucleic Acid Isolation kit.

Figure S4 :
Figure S4: Total bacterial load by sample type.DNA was isolated from human milk, infant oral swabs, infant faeces, and maternal faeces from 14 Lifelines NEXT mother-infant pairs using the DNeasy PowerSoil Pro kit.Total bacterial load (16S rRNA gene copy numbers) was assessed using quantitative PCR (qPCR), normalised to 1 ng sample DNA and log-transformed for plotting.The boxplots show medians with 25% and 75% percentiles and 1.5x the interquartile range.The total bacterial load differed significantly between sample types (Kruskal-Wallis test with Dunn's posthoc test).Statistical significance: **FDR<0.01,***FDR<0.001.Note that sample input for DNA isolation differed between sample types, i.e., human milk DNA was isolated from 3.5 ml human milk, maternal

Figure S5 :
Figure S5: Bacterial composition by sequencing method and sample type.

Figure S6 :
Figure S6: Genera richness by sequencing method and sample type.DNA was isolated from human milk, infant oral swabs, infant faeces, and maternal faeces from 14 Lifelines NEXT mother-infant pairs using the DNeasy PowerSoil Pro kit.Samples were sequenced using 16S and 16S-ITS-23S rRNA gene sequencing and bacterial taxonomies were assigned using the FANGORN database.The plot shows the number of classified bacterial genera (genera richness) for each sample type and sequencing method.The boxplots represent medians with 25% and 75% percentiles and 1.5x the interquartile range.One maternal faecal sample failed 16S rRNA gene sequencing.Statistically significant results (Mann-Whitney U tests): **FDR<0.01,***FDR<0.001.

Figure S7 :
Figure S7: Bacterial genera detected in human milk and other maternal and infant sample types.DNA was isolated from human milk, infant oral swabs, infant faeces, and maternal faeces from 14 Lifelines NEXT mother-infant pairs using the DNeasy PowerSoil Pro kit.Samples were sequenced using 16S and 16S-ITS-23S rRNA gene sequencing and bacterial taxonomies were assigned using the FANGORN database.One maternal faecal sample failed 16S rRNA gene sequencing.The bars show the number of pairs, which have the indicated genus detected both in milk and (A) the infant oral cavity, (B) infant faeces, and (C) maternal faeces.Only genera shared in at least one sample-pair are shown.The genera detected in at least two pairs with both sequencing methods (highlighted in bold and with darker colour) were investigated more closely for bacterial sharing based on exact ASV matching (see Figure 3C).

Figure S8 :
Figure S8: Streptococcus phylogenetic tree based on 16S rRNA gene sequencing data.DNA was isolated from human milk, infant oral swabs, infant faeces, and maternal faeces from 14 Lifelines NEXT mother-infant pairs using the DNeasy PowerSoil Pro kit.Samples were sequenced using 16S rRNA gene sequencing and bacterial taxonomies were assigned using the FANGORN database.The maternal faecal sample from pair 5 failed 16S rRNA gene sequencing.The plot shows a phylogenetic tree for Streptococcus based on 16S rRNA gene sequencing data.Dot shape indicates sample type.Dot colour indicates the mother-infant pair ID.Phylogenetic distances were log-transformed for plotting.